NSF PAR Search | NSF Public Access Repository


SQLucid: Grounding Natural Language Database Queries with Interactive Explanations

https://doi.org/10.1145/3654777.3676368

Tian, Yuan; Kummerfeld, Jonathan K; Li, Toby Jia-Jun; Zhang, Tianyi (October 2024, ACM)

Though recent advances in machine learning have led to significant improvements in natural language interfaces for databases, the accuracy and reliability of these systems remain limited, especially in high-stakes domains. This paper introduces SQLucid, a novel user interface that bridges the gap between non-expert users and complex database querying processes. SQLucid addresses existing limitations by integrating visual correspondence, intermediate query results, and editable step-by-step SQL explanations in natural language to facilitate user understanding and engagement. This unique blend of features empowers users to understand and refine SQL queries easily and precisely. Two user studies and one quantitative experiment were conducted to validate SQLucid’s effectiveness, showing significant improvement in task completion accuracy and user confidence compared to existing interfaces. Our code is available at https://github.com/magic-YuanTian/SQLucid.
Full Text Available
An AI-Resilient Text Rendering Technique for Reading and Skimming Documents

https://doi.org/10.1145/3613904.3642699

Gu, Ziwei; Arawjo, Ian; Li, Kenneth; Kummerfeld, Jonathan K; Glassman, Elena L (May 2024, ACM)

Readers find text difficult to consume for many reasons. Summarization can address some of these difficulties, but introduces others, such as omitting, misrepresenting, or hallucinating information, which can be hard for a reader to notice. One approach to addressing this problem is to instead modify how the original text is rendered to make important information more salient. We introduce Grammar-Preserving Text Saliency Modulation (GP-TSM), a text rendering method with a novel means of identifying what to de-emphasize. Specifically, GP-TSM uses a recursive sentence compression method to identify successive levels of detail beyond the core meaning of a passage, which are de-emphasized by rendering words in successively lighter but still legible gray text. In a lab study (n=18), participants preferred GP-TSM over pre-existing word-level text rendering methods and were able to answer GRE reading comprehension questions more efficiently.
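The idea of recursive compression followed by graded gray rendering can be illustrated with a minimal sketch. This is not the authors' implementation: the `compress` callback, the toy filler list, the three-round limit, and the gray value 170 are all assumptions chosen for illustration (the paper's actual compression method is not described in this abstract). Each word is assigned the number of compression rounds it survives, and words that are dropped earlier are rendered lighter.

```python
from typing import Callable, List

# Hypothetical toy compressor: drops one filler word per round.
# A real system would use a far more capable sentence compressor.
FILLERS = ["very", "quickly", "the"]


def toy_compress(words: List[str]) -> List[int]:
    """Return the positions (within `words`) of the words to keep."""
    for filler in FILLERS:
        if filler in words:
            drop = words.index(filler)
            return [i for i in range(len(words)) if i != drop]
    return list(range(len(words)))  # nothing left to drop


def saliency_levels(
    words: List[str],
    compress: Callable[[List[str]], List[int]],
    max_rounds: int = 3,
) -> List[int]:
    """Count, for each word, how many compression rounds it survives."""
    levels = [0] * len(words)
    kept = list(range(len(words)))  # original indices still present
    for _ in range(max_rounds):
        survivors = compress([words[i] for i in kept])
        new_kept = [kept[p] for p in survivors]
        if len(new_kept) == len(kept):
            break  # compression made no further progress
        for i in new_kept:
            levels[i] += 1
        kept = new_kept
    return levels


def gray_value(level: int, max_level: int, lightest: int = 170) -> int:
    """Map a level to a gray intensity: 0 is black, `lightest` is the palest."""
    if max_level == 0:
        return 0  # no compression happened, so render everything at full weight
    return round(lightest * (1 - level / max_level))


words = ["the", "cat", "very", "quickly", "ran"]
levels = saliency_levels(words, toy_compress)
shades = [gray_value(lv, max(levels)) for lv in levels]
```

In this sketch the words that survive every round ("cat", "ran") are rendered darkest, and the first word dropped ("very") is rendered palest while remaining visible, so the full original text stays on the page.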
Full Text Available
Supporting Sensemaking of Large Language Model Outputs at Scale

https://doi.org/10.1145/3613904.3642139

Gero, Katy Ilonka; Swoopes, Chelse; Gu, Ziwei; Kummerfeld, Jonathan K; Glassman, Elena L (May 2024, ACM)

Full Text Available
A Comparative Multidimensional Analysis of Empathetic Systems

Lee, Andrew; Kummerfeld, Jonathan; An, Larry; Mihalcea, Rada (March 2024, Association for Computational Linguistics)

Full Text Available
Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations

https://doi.org/10.18653/v1/2023.emnlp-main.1004

Tian, Yuan; Zhang, Zheng; Ning, Zheng; Li, Toby; Kummerfeld, Jonathan; Zhang, Tianyi (December 2023, Association for Computational Linguistics)

Relational databases play an important role in business, science, and more. However, many users cannot fully unleash the analytical power of relational databases, because they are not familiar with database languages such as SQL. Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine incorrect queries. To address these issues, we introduce a new interaction mechanism that allows users to directly edit a step-by-step explanation of a query to fix errors. Our experiments on multiple datasets, as well as a user study with 24 participants, demonstrate that our approach can achieve better performance than multiple SOTA approaches.
Full Text Available
Analyzing the Surprising Variability in Word Embedding Stability Across Languages

https://doi.org/10.18653/v1/2021.emnlp-main.476

Burdick, Laura; Kummerfeld, Jonathan; Mihalcea, Rada (October 2021, Proceedings of the Conference on Empirical Methods in Natural Language Processing)

Full Text Available
Exploring the Value of Personalized Word Embeddings

https://doi.org/10.18653/v1/2020.coling-main.604

Welch, Charles; Kummerfeld, Jonathan; Pérez-Rosas, Verónica; Mihalcea, Rada (December 2020, Proceedings of the International Conference on Computational Linguistics)

Full Text Available
Compositional Demographic Word Embeddings

https://doi.org/10.18653/v1/2020.emnlp-main.334

Welch, Charles; Kummerfeld, Jonathan; Pérez-Rosas, Verónica; Mihalcea, Rada (November 2020, Conference on Empirical Methods in Natural Language Processing)

Full Text Available
Learning from Personal Longitudinal Dialog Data

https://doi.org/10.1109/MIS.2019.2916965

Welch, Charles; Perez-Rosas, Veronica; Kummerfeld, Jonathan; Mihalcea, Rada (July 2019, IEEE Intelligent Systems)

We explore the use of longitudinal dialog data for two dialog prediction tasks: next message prediction and response time prediction. We show that a neural model using personal data that leverages a combination of message content, style matching, time features, and speaker attributes leads to the best results for both tasks, with error rate reductions of up to 15% compared to a classifier that relies exclusively on message content and to a classifier that does not use personal data.
Full Text Available
Look Who's Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog

Welch, Charles; Perez-Rosas, Veronica; Kummerfeld, Jonathan; Mihalcea, Rada (April 2019, Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing))

We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners. The corpus consists of half a million instant messages, across several messaging platforms. We focus our analyses on seven speaker attributes, each of which partitions the set of speakers, namely: gender; relative age; family member; romantic partner; classmate; co-worker; and native to the same country. In addition to the content of the messages, we examine conversational aspects such as the time messages are sent, messaging frequency, psycholinguistic word categories, linguistic mirroring, and graph-based features reflecting how people in the corpus mention each other. We present two sets of experiments predicting each attribute using (1) short context windows; and (2) a larger set of messages. We find that using all features leads to gains of 9-14% over using message text only.
Full Text Available
